DETAILED ACTION

Claim Rejections - 35 USC § 102 

1. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public
use or on sale in this country, more than one year prior to the date of application for patent in the United
States.

2. Claims 1-5, 8, 11-14, 17, 19, and 20 are rejected under 35 U.S.C. 102(b) as
being anticipated by Leonardi et al., (Semantic Indexing of Multimedia Documents).

As per claim 1, Leonardi et al., teach a method for detecting highlights from 
videos, comprising: 

extracting audio features from the video ("divide the input stream into audio and 
video"; page 46, col.2, lines 39 - 43); 

classifying the audio features as labels (page 47, col.1, lines 9 - 14); 

extracting visual features from the video ("divide the input stream into audio and 
video"; page 46, col.2, lines 39 - 43); 

classifying the visual features as labels ("two-state-HMM classifier"; page 47, 
col.1, lines 38-43); and 

fusing, probabilistically, the audio labels and visual labels to detect highlights in
the video ("calculated four different performance indices"; page 49, col.1, lines 3 - 9).

As per claim 2, Leonardi et al., further disclose that the video is compressed
("MPEG-2"; page 45, col.1, line 26).

As per claim 3, Leonardi et al., further disclose that silent features are classified 
according to audio energy and zero cross rate ("extracts a feature vector from the low- 
level acoustic properties of each clip such as zero crossing rate"; page 46, col. 2, lines 
46 - 50). 
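
For illustration only, a silence classification based on audio energy and zero crossing rate, as recited in claim 3, can be sketched as follows. The sketch is not taken from Leonardi et al. or from the application; the function names, the threshold values, and the rule combining the two features are assumptions.

```python
def short_time_energy(samples):
    """Mean squared amplitude of one audio clip."""
    return sum(s * s for s in samples) / len(samples)


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def is_silent(samples, energy_threshold=1e-4, zcr_threshold=0.5):
    # Assumed rule: a clip is labeled silent only when both its energy and
    # its zero crossing rate fall below illustrative thresholds.
    return (
        short_time_energy(samples) < energy_threshold
        and zero_crossing_rate(samples) < zcr_threshold
    )


loud_clip = [0.5, -0.5, 0.5, -0.5]
quiet_clip = [0.001, 0.001, 0.001, 0.001]
print(is_silent(loud_clip), is_silent(quiet_clip))
```

In practice the thresholds would be tuned on labeled audio rather than fixed in advance.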

As per claim 4, Leonardi et al., further disclose that the audio features are
Mel-scale frequency cepstrum coefficients (page 46, col. 2, lines 46 - 50).

As per claim 5, Leonardi et al., further disclose that the audio features are 
MPEG-7 descriptors (page 50, col. 2, line 13). 

As per claim 8, Leonardi et al., further disclose the visual features are based on 
motion activity descriptors ("motion vectors" page 46, col. 2, lines 50 - 53). 

As per claim 11, Leonardi et al., further disclose the motion activity is averaged to
obtain the visual labels (page 45, col. 2, lines 36 - 38). 

As per claim 12, Leonardi et al., further disclose the visual labels are selected 
from the group consisting of close shot, replay, and zoom (page 46, col. 2, lines 1-12). 

As per claim 13, Leonardi et al., further disclose the probabilistic fusion uses a
discrete-observation coupled hidden Markov model (page 47, col.1, lines 31-41).

As per claim 14, Leonardi et al., further disclose the discrete-observation coupled 
hidden Markov model includes audio hidden Markov models and visual hidden Markov 
models (page 47, col.1, lines 31-41).

As per claim 17, Leonardi et al., further disclose the video is a sport video 
("soccer video"; page 45, col.1, lines 1 and 2). 

As per claims 19 and 20, Leonardi et al., further disclose the audio portion of the
video is compressed, and the visual portion of the video is compressed ("MPEG-7 
content of audio-visual program"; page 50, col. 2, lines 12 - 20). 

Claim Rejections - 35 USC § 103 

3. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

4. Claims 6 and 7 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Leonardi et al., (Semantic Indexing of Multimedia Documents, April-June 2002), in view
of Rui et al., (Automatically Extracting Highlights for TV Baseball Programs, Eighth ACM
International Conference on Multimedia, pp. 105-115, 2000).

As per claim 6, Leonardi et al., do not specifically teach that the audio features 
are classified using Gaussian mixture models. 

Rui et al., teach excited speech classification using Gaussian fitting (section 6.5, 
lines 1 and 2). 

Therefore it would have been obvious to one of ordinary skill in the art at the time 
the invention was made to use Gaussian mixture models as taught by Rui et al., in 
Leonardi et al., because that would help better classify the audio signal. 

As per claim 7, Leonardi et al., further disclose that audio labels are selected 
from the group consisting of applause, cheering, and music ("background noise" page 
47, col.1, lines 12-14). 

However, Leonardi et al., do not specifically teach audio labels are selected from 
the group consisting of ball hit, speech with music, male speech and female speech. 

Rui et al., teach classifying audio signals into silence, speech, music, song, and
mixtures of the above, as well as baseball hit detection (section 5.2; section 2,
paragraph 6, lines 11 and 12).

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time the invention was made to classify the audio signals as taught by Rui et al., in 
Leonardi et al., because that would help better determine the highlights of the soccer 
video. 

The examiner takes official notice that classifying speech between male speech
and female speech is well known in the art. One having ordinary skill in the art would 
have found it obvious to classify the audio as male speech and female speech, because 
that would help determine particular scenes of the multimedia documents. 

5. Claims 9, 10, and 15 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Leonardi et al., (Semantic Indexing of Multimedia Documents, April-June 2002), in
view of Wang et al., (Integration of Multimodal Features For Video Scene Classification
based on HMM, 1/99).

As per claim 9, Leonardi et al., further disclose that visual features include motion 
vectors ("motion vectors" page 46, col. 2, lines 50 - 53). 

However, Leonardi et al., do not specifically teach that visual features include 
dominant color. 

Wang et al., teach visual features include the most dominant color (page 54,
paragraph 2, lines 6 and 7).

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time the invention was made to include the most dominant color in visual features as 
taught by Wang et al., in Leonardi et al., because that would help better classify the 
video signal, so that highlights can be found. 

As per claim 10, Leonardi et al., do not specifically teach that the variance of the 
motion activity is quantized to obtain the visual labels. 

Wang et al., teach that visual features include the most dominant color, the most
dominant motion vectors, and the mean and variance of the motion vector, and that the
colors of each video frame are quantized into 64 colors adaptively (page 54, paragraph 2,
lines 6 - 9).

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time the invention was made to quantize the variance as taught by Wang et al., in 
Leonardi et al., because that would help better classify the video signal, so that
highlights can be found. 
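
For illustration only, averaging motion activity (claim 11) and quantizing its variance into a discrete label (claim 10) can be sketched as follows. The sketch is not taken from Wang et al., Leonardi et al., or the application; the function names and the bin boundaries are hypothetical.

```python
from bisect import bisect_right
from statistics import mean, pvariance


def motion_statistics(magnitudes):
    """Mean and population variance of per-frame motion vector magnitudes."""
    return mean(magnitudes), pvariance(magnitudes)


def quantize(value, boundaries):
    """Map a scalar to an integer label: the index of the bin it falls in.

    `boundaries` must be sorted in ascending order (assumed, not checked).
    """
    return bisect_right(boundaries, value)


# Hypothetical bin boundaries separating low, medium, high, and very high variance.
VARIANCE_BOUNDARIES = [0.5, 2.0, 8.0]

average, variance = motion_statistics([1.0, 3.0])
visual_label = quantize(variance, VARIANCE_BOUNDARIES)
```

Each quantized value then serves as a discrete observation symbol for a downstream classifier.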

As per claim 15, Leonardi et al., do not specifically teach that the
discrete-observation coupled hidden Markov model is generated from a Cartesian product of
states of the audio hidden Markov models and the visual hidden Markov models, and a
Cartesian product of observations of the audio hidden Markov models and the visual
hidden Markov models.

Wang et al., teach training an HMM for each of the audio, color, and motion 
modalities separately. The observed sequences of different features are fed into the 
corresponding HMM. The final observation probability is computed as... (page 55, 
paragraph 2). 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time the invention was made to calculate Cartesian product of HMMs as taught by 
Wang et al., in Leonardi et al., because that would help determine particular scenes of 
the multimedia documents. 
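
For illustration only, the Cartesian products recited in claim 15 can be sketched as follows. The sketch is not taken from Wang et al., Leonardi et al., or the application; the state names are hypothetical, and the observation labels are borrowed from the groups recited in claims 7 and 12.

```python
from itertools import product

# Hypothetical hidden states of an audio HMM and a visual HMM.
audio_states = ["a0", "a1"]
visual_states = ["v0", "v1"]

# Discrete observation labels, taken from the groups recited in claims 7 and 12.
audio_labels = ["applause", "cheering", "music"]
visual_labels = ["close shot", "replay", "zoom"]


def product_space(audio_items, visual_items):
    """All (audio, visual) pairs, i.e. the Cartesian product of the two sets."""
    return list(product(audio_items, visual_items))


joint_states = product_space(audio_states, visual_states)
joint_observations = product_space(audio_labels, visual_labels)
print(len(joint_states), len(joint_observations))
```

The joint spaces grow multiplicatively: 2 x 2 states and 3 x 3 observations in this example.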

6. Claims 16 and 18 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Leonardi et al., (Semantic Indexing of Multimedia Documents, April-June 2002), in
view of Rui et al., (US PAP 2003/0103647).

As per claim 16, Leonardi et al., teach training the discrete-observation coupled hidden
Markov model ("training two-state HMM"; page 47, col.1, lines 40 and 41).

However, Leonardi et al., do not specifically teach training with hand labeled
videos.

Rui et al., teach that the training set is view-labeled in that each face image is
manually labeled (paragraph 95). 

Therefore it would have been obvious to one of ordinary skill in the art at the time 
the invention was made to manually label videos as taught by Rui et al., in Leonardi et 
al., because that would help better classify the video signals. 

As per claim 18, Leonardi et al., do not specifically teach determining likelihoods 
for the highlights; and thresholding the highlights. 

Rui et al., disclose that a multi-cue tracking module includes an observation
likelihood module (paragraph 109), and disclose detecting candidates for new face
regions, wherein each candidate is a region of the video content that potentially
includes a new face. A confidence level is generated for each candidate, and if the
confidence level does not exceed the threshold value, the candidate is discarded
(paragraph 41).

Therefore it would have been obvious to one of ordinary skill in the art at the time 
the invention was made to threshold candidate face regions as taught by Rui et al., in 
Leonardi et al., because that would help determine particular scenes by rejecting
non-relevant scenes.
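
For illustration only, determining likelihoods and thresholding them, as recited in claim 18, can be sketched as follows. The sketch is not taken from Rui et al., Leonardi et al., or the application; the function name, the segment identifiers, the likelihood values, and the default threshold are hypothetical.

```python
def threshold_highlights(segment_likelihoods, threshold=0.5):
    """Keep only the segments whose likelihood exceeds the threshold.

    A segment whose likelihood does not exceed the threshold is discarded.
    """
    return [(seg, p) for seg, p in segment_likelihoods if p > threshold]


# Hypothetical (segment id, highlight likelihood) pairs.
candidates = [("seg-1", 0.91), ("seg-2", 0.20), ("seg-3", 0.67)]
highlights = threshold_highlights(candidates)
```

Raising the threshold yields fewer, higher-confidence highlights.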

Conclusion 

7. The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. 

Cabasson et al., (US Patent 6,956,904) teach summarizing videos using motion 
activity descriptors correlated with audio features. 

Pan et al., (US PAP 2004/0017389) teach summarization of soccer video 
content. 

Li et al., (US Patent 7,143,354) teach summarization of baseball video content.

8. Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Leonard Saint-Cyr whose telephone number is (571)
272-4247. The examiner can normally be reached on Mon-Friday.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Richemond Dorvil, can be reached on (571) 272-7602. The fax phone
number for the organization where this application or proceeding is assigned is (571)
273-8300.

Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 